Overview
This section provides an overview of the imported dataset. Dataset statistics, variable types, a missing data profile and potential alerts are shown below.
| Discrete variable | 23 |
| Continuous variable | 4 |
| All missing variable | 0 |
| exitus_dt has 181090 (90.5%) missing values |
| dose_3_brand_cd has 181236 (90.6%) missing values |
| dose_3_dt has 181164 (90.6%) missing values |
| fully_vaccinated_dt has 183793 (91.9%) missing values |
| The variable ‘person_id’ does not have all unique values | Number of duplicate values: 9999 |
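The alerts above come down to two kinds of checks: the share of missing values per variable and the number of repeated identifiers. The following is a minimal base R sketch of how such figures could be computed. The toy data, the 50% alert threshold and the use of `duplicated()` are assumptions for illustration, not the report's actual implementation.

```r
# Toy data frame standing in for the imported dataset (values are made up).
df <- data.frame(
  person_id = c("a", "b", "b", "c", "d"),
  exitus_dt = as.Date(c(NA, NA, NA, NA, "2021-03-01")),
  age_nm    = c(34L, 51L, 51L, NA, 78L),
  stringsAsFactors = FALSE
)

# Count and percentage of missing values per variable.
n_missing   <- colSums(is.na(df))
pct_missing <- round(100 * n_missing / nrow(df), 1)

# Hypothetical alert threshold: flag variables with more than 50% missing.
alert_threshold <- 50
flagged <- names(pct_missing)[pct_missing > alert_threshold]

# Number of rows whose person_id repeats an earlier row.
n_duplicates <- sum(duplicated(df$person_id))
```

Note that `duplicated()` counts only the second and later occurrences of a value, so how the report defines "duplicate values" may differ from this count.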
Variables
This section provides more detailed information per variable in the imported dataset.
Class of the variable: character
More than 100 distinct values
More than 100 distinct values
Class of the variable: character
Class of the variable: integer
More than 100 distinct values
Histogram drawn with the default of 30 bins; 10000 rows with missing or non-finite values were excluded.
Class of the variable: character
Class of the variable: Date
More than 100 distinct values
Histogram drawn with the default of 30 bins; 181090 rows with missing or non-finite values were excluded.
Class of the variable: logical
Class of the variable: character
Class of the variable: character
Class of the variable: character
Class of the variable: character
Class of the variable: character
More than 100 distinct values
More than 100 distinct values
Class of the variable: character
Class of the variable: character
Class of the variable: character
Class of the variable: character
Class of the variable: logical
Class of the variable: integer
Histogram drawn with the default of 30 bins; 10000 rows with missing or non-finite values were excluded.
Class of the variable: integer
Histogram drawn with the default of 30 bins; 10000 rows with missing or non-finite values were excluded.
Class of the variable: character
Class of the variable: Date
More than 100 distinct values
Histogram drawn with the default of 30 bins; 10000 rows with missing or non-finite values were excluded.
Class of the variable: character
Class of the variable: Date
More than 100 distinct values
Histogram drawn with the default of 30 bins; 28937 rows with missing or non-finite values were excluded.
Class of the variable: character
Class of the variable: Date
More than 100 distinct values
Histogram drawn with the default of 30 bins; 181164 rows with missing or non-finite values were excluded.
Class of the variable: integer
Histogram drawn with the default of 30 bins; 10000 rows with missing or non-finite values were excluded.
Class of the variable: Date
More than 100 distinct values
Histogram drawn with the default of 30 bins; 183793 rows with missing or non-finite values were excluded.
Class of the variable: logical
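The per-variable notes in this section (class, distinct-value count, rows dropped from a histogram) can be reproduced with a few base R calls. The sketch below is an illustration on made-up data. The cut-off of 100 mirrors the "More than 100 distinct values" note, but how the report applies it is an assumption.

```r
# Toy data; column names follow the report, values are made up.
df <- data.frame(
  person_id = sprintf("p%03d", 1:150),
  sex_cd    = rep(c("1", "2", NA), 50),
  age_nm    = rep(c(25L, 40L, NA), 50),
  stringsAsFactors = FALSE
)

# Class of each variable (first class only, e.g. "Date" or "integer").
var_class <- vapply(df, function(x) class(x)[1], character(1))

# Number of distinct non-missing values per variable.
n_distinct <- vapply(df, function(x) length(unique(x[!is.na(x)])), integer(1))

# Hypothetical cut-off mirroring the "More than 100 distinct values" note.
many_values <- n_distinct > 100

# Rows a histogram of age_nm would have to drop because they are missing.
n_dropped <- sum(is.na(df$age_nm))
```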
Compliance with the Common Data Model specification
We check whether the imported dataset complies with the data model specification (https://docs.google.com/spreadsheets/d/1Eva2ucg_M0WaDkCaF7qfBxk2DwTlUac9gKuP3xck4rw/edit#gid=0).
To comply with the data model, the dataset must pass a set of validation rules. Each rule is evaluated for every record, and the table below summarizes how many records pass, fail, or cannot be evaluated because of missing values (NAs).
| Validation rule | Rule name | Items | Passes | Fails | Percentage of fails | Number of NAs | Percentage of NAs | Error | Warning |
|---|---|---|---|---|---|---|---|---|---|
| `is.na(sex_cd) \| sex_cd %vin% c("0", "1", "2", "9")` | V01 | 200000 | 200000 | | 0% | 0 | | | |
| `is.na(age_nm) \| age_nm - 18 >= -1e-08 & age_nm - 115 <= 1e-08` | V02 | 200000 | 170384 | | 14.81% | 0 | | | |
| `is.na(age_cd) \| age_cd %vin% c("0-18", "18-25", "25-35", "35-45", "45-55", "55-65", "65-75", "75-85", "85-95", "95-105", "105-115")` | V03 | 200000 | 200000 | | 0% | 0 | | | |
| `is.na(exitus_bl) \| exitus_bl %vin% c(TRUE, FALSE)` | V04 | 200000 | 200000 | | 0% | 0 | | | |
| `is.na(education_level_cd) \| education_level_cd %vin% c("Low", "Middle", "High")` | V05 | 200000 | 200000 | | 0% | 0 | | | |
| `is.na(income_category_cd) \| income_category_cd %vin% c("Low", "Middle", "High")` | V06 | 200000 | 200000 | | 0% | 0 | | | |
| `is.na(migration_background_cd) \| migration_background_cd %vin% c("NATIVE", "EU", "NON-EU", "PAR")` | V07 | 200000 | 200000 | | 0% | 0 | | | |
| `is.na(household_type_cd) \| household_type_cd %vin% c("ALONE", "COUPLE", "COUPLE_CHILD", "LONE", "EXTENDED", "OTHER")` | V08 | 200000 | 200000 | | 0% | 0 | | | |
| `is.na(hospi_due_to_covid_bl) \| hospi_due_to_covid_bl %vin% c(TRUE, FALSE)` | V09 | 200000 | 200000 | | 0% | 0 | | | |
| `is.na(test_positive_to_covid_nm) \| test_positive_to_covid_nm - 0 >= -1e-08 & test_positive_to_covid_nm - 50 <= 1e-08` | V10 | 200000 | 200000 | | 0% | 0 | | | |
| `is.na(test_nm) \| test_nm - 0 >= -1e-08 & test_nm - 50 <= 1e-08` | V11 | 200000 | 200000 | | 0% | 0 | | | |
| `is.na(dose_1_brand_cd) \| dose_1_brand_cd %vin% c("BP", "MD", "JJ", "AZ", "NV")` | V12 | 200000 | 200000 | | 0% | 0 | | | |
| `is.na(dose_2_brand_cd) \| dose_2_brand_cd %vin% c("BP", "MD", "JJ", "AZ", "NV")` | V13 | 200000 | 200000 | | 0% | 0 | | | |
| `is.na(dose_3_brand_cd) \| dose_3_brand_cd %vin% c("BP", "MD", "JJ", "AZ", "NV")` | V14 | 200000 | 200000 | | 0% | 0 | | | |
| `is.na(doses_nm) \| doses_nm - 0 >= -1e-08 & doses_nm - 10 <= 1e-08` | V15 | 200000 | 200000 | | 0% | 0 | | | |
| `(is.na(dose_1_dt) & is.na(dose_2_dt)) \| is.na(dose_2_dt) \| !is.na(dose_1_dt) & !is.na(dose_2_dt) & (dose_1_dt < dose_2_dt)` | V16 | 200000 | 110188 | | 44.91% | 0 | | | |
| `(is.na(dose_2_dt) & is.na(dose_3_dt)) \| is.na(dose_3_dt) \| !is.na(dose_2_dt) & !is.na(dose_3_dt) & (dose_2_dt < dose_3_dt)` | V17 | 200000 | 190007 | | 5% | 0 | | | |
| `is.na(fully_vaccinated_dt) \| is.na(exitus_dt) \| !is.na(fully_vaccinated_dt) & !is.na(exitus_dt) & fully_vaccinated_dt <= exitus_dt` | V18 | 200000 | 199194 | | 0.4% | 0 | | | |
| `(!is.na(dose_1_dt) & !is.na(dose_2_dt) & !is.na(dose_3_dt) & doses_nm - 3 >= -1e-08) \| (!is.na(dose_1_dt) & !is.na(dose_2_dt) & is.na(dose_3_dt) & abs(doses_nm - 2) <= 1e-08) \| (!is.na(dose_1_dt) & is.na(dose_2_dt) & is.na(dose_3_dt) & abs(doses_nm - 1) <= 1e-08) \| (is.na(dose_1_dt) & is.na(dose_2_dt) & is.na(dose_3_dt) & abs(doses_nm - 0) <= 1e-08)` | V19 | 200000 | 163140 | | 13.66% | 9537 | | | |
| `is.na(dose_1_dt) \| (!is.na(dose_1_dt) & !is.na(dose_1_brand_cd))` | V20 | 200000 | 190488 | | 4.76% | 0 | | | |
| `is.na(dose_2_dt) \| (!is.na(dose_2_dt) & !is.na(dose_2_brand_cd) & !is.na(dose_1_dt) & !is.na(dose_1_brand_cd))` | V21 | 200000 | 175638 | | 12.18% | 0 | | | |
| `is.na(dose_3_dt) \| (!is.na(dose_3_dt) & !is.na(dose_3_brand_cd) & !is.na(dose_2_dt) & !is.na(dose_2_brand_cd) & !is.na(dose_1_dt) & !is.na(dose_1_brand_cd))` | V22 | 200000 | 195618 | | 2.19% | 0 | | | |
The vertical bars in the validation plot indicate, for each rule, the percentage of records ‘Passing’, ‘Failing’ and ‘Missing’.
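To illustrate how a rule turns into pass, fail and NA counts, here is a minimal base R sketch that evaluates three simplified rules record by record. It is an illustration, not the report's actual code. The rules use base R `%in%` in place of `%vin%`, the numeric tolerance of 1e-08 is dropped, and the data and the helper name `check_rule` are hypothetical.

```r
# Toy records; column names follow the report, values are made up.
df <- data.frame(
  sex_cd    = c("1", "2", "7", NA, "9"),
  age_nm    = c(30, 17, 64, 120, NA),
  dose_1_dt = as.Date(c("2021-01-10", NA, "2021-02-01", "2021-03-05", NA)),
  dose_2_dt = as.Date(c("2021-02-10", NA, "2021-01-15", NA, "2021-04-01")),
  stringsAsFactors = FALSE
)

# Simplified rules stored as unevaluated expressions.
rules <- list(
  V01 = quote(is.na(sex_cd) | sex_cd %in% c("0", "1", "2", "9")),
  V02 = quote(is.na(age_nm) | (age_nm >= 18 & age_nm <= 115)),
  V16 = quote((is.na(dose_1_dt) & is.na(dose_2_dt)) | is.na(dose_2_dt) |
                (!is.na(dose_1_dt) & !is.na(dose_2_dt) & dose_1_dt < dose_2_dt))
)

# Hypothetical helper: evaluate one rule per record and tabulate the outcome.
check_rule <- function(rule, data) {
  result <- eval(rule, data)            # TRUE, FALSE or NA for each record
  passes <- sum(result, na.rm = TRUE)
  n_na   <- sum(is.na(result))
  fails  <- length(result) - passes - n_na
  data.frame(
    items     = length(result),
    passes    = passes,
    fails     = fails,
    pct_fails = round(100 * fails / length(result), 2),
    n_na      = n_na
  )
}

summary_table <- do.call(rbind, lapply(rules, check_rule, data = df))
summary_table$rule <- names(rules)
```

In this sketch a record counts as failing only when the rule evaluates to `FALSE`. Records for which the rule evaluates to `NA` are counted separately, which corresponds to the ‘Missing’ share in the plot.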